library(tidyverse) # loads ggplot2, dplyr, and others
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggdist) # nice distributional plots
library(ggtext) # useful for text formatting
library(patchwork) # laying out sub-figures
library(RColorBrewer) # color palettes
library(palmerpenguins) # easy access to the penguins data
References
There are lots of ggplot2 tutorials on the web, but the ggplot2 website and book are definitive resources if you really want to understand how to use and extend ggplot2:
ggplot2 website – https://ggplot2.tidyverse.org/
ggplot2 book – https://ggplot2-book.org/
Data
# uncomment if you want to review the penguins data set in
# the spreadsheet viewer
# View(penguins)
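If you'd rather stay in the console, here is a minimal sketch for a quick look at the data's structure. It uses dplyr's glimpse(), which is loaded with the tidyverse, and base R's is.na(). The missing-value counts are worth noting: the two penguins without a recorded body mass are the rows ggplot2 reports as "Removed 2 rows" in the body-mass plots.

```r
library(dplyr)          # for glimpse()
library(palmerpenguins) # provides the penguins data frame

# one line per column: its name, type, and first few values
glimpse(penguins)

# number of missing values in each column
colSums(is.na(penguins))
```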
Required layers: Data, geometry, and aesthetics
Every ggplot2 figure requires us to define three elements:
Data, in the form of a data frame
One or more geometries (geoms) that specify the plot type(s)
Aesthetics, specifying which variables are mapped to visual properties of the geoms (position, color, fill, size, etc.)
Multiple ways to specify the layers
Aesthetics specified in ggplot call, inherited by all geoms:
# both geoms inherit the aesthetics
ggplot(penguins, aes(x = body_mass_g)) +
  geom_density() +
  geom_rug()
Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_density()`).
Aesthetics specified in an independent call to aes:
# again, aesthetics are inherited by the geoms that follow
ggplot(penguins) +
  aes(x = body_mass_g) +
  geom_density() +
  geom_rug()
Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_density()`).
Aesthetics specified in the geoms themselves:
# here each geom needs to redefine the aesthetics
ggplot(penguins) +
  geom_density(aes(x = body_mass_g)) +
  geom_rug(aes(x = body_mass_g))
Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_density()`).
Here’s an example where it’s useful to set an aesthetic in an individual geom rather than in the ggplot call:
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_point()`).
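The code for this example isn't shown here. The following is a plausible sketch, not the original code. It assumes the plot maps body mass to x and flipper length to y (as the labs() example uses) and colors the points by species. Because color is set inside geom_point() rather than in ggplot(), a regression line added with geom_smooth() is fitted to all penguins at once instead of once per species. The name my_plot matches the object used in the next section, but this definition is an assumption.

```r
library(ggplot2)
library(palmerpenguins)

# Hypothetical reconstruction: x and y are shared by every layer, but
# color is mapped only inside geom_point(), so only the points are
# split by species.
my_plot <- ggplot(penguins, aes(x = body_mass_g, y = flipper_length_mm)) +
  geom_point(aes(color = species))

# Because color is not inherited, geom_smooth() fits ONE regression line
# to all penguins rather than one line per species.
my_plot_with_fit <- my_plot +
  geom_smooth(method = "lm", formula = y ~ x, color = "black")
my_plot_with_fit
```

Moving color = species into the ggplot() call instead would give three lines, one per species.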
Optional aspects of ggplot figures
Specifying data, aesthetics, and geoms is required to create a plot. The following aspects are optional, but they are often critical for effectively exploring data or depicting its key features.
Axis labels, titles, and captions
The labs function is used to specify axis labels, titles, subtitles, etc.
better_plot <- my_plot +
  labs(x = "Body mass (g)",
       y = "Flipper length (mm)",
       title = "Penguin morphology",
       caption = "Data from palmerpenguins")
better_plot
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_point()`).
Changing plot limits
If you want to change the limits of a plot, the recommended way to do so is to use the coord_cartesian function to specify x and y limits.
# zoom in on a particular region of the plot
better_plot + coord_cartesian(xlim = c(4000, 5000), ylim = c(180, 210))
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_point()`).
# zoom out to include margins
better_plot + coord_cartesian(xlim = c(0, 6500), ylim = c(0, 240))
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_point()`).
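An alternative function, lims(), can also be used to specify limits, but lims() removes data outside the limits rather than just changing the visible range. This matters when the plot includes a statistical summary, such as a linear model fit: with lims() the model is fitted only to the data that remain, whereas coord_cartesian() leaves the underlying data, and hence the fit, unchanged.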
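The difference is easiest to see with a regression line. Here is a minimal sketch that assumes a linear fit added with geom_smooth(method = "lm"), which is not part of the plots above. Both versions show the same x range, but only the lims() version refits the line to the 4000 to 5000 g subset.

```r
library(ggplot2)
library(palmerpenguins)

base <- ggplot(penguins, aes(x = body_mass_g, y = flipper_length_mm)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# coord_cartesian() only zooms: the line is still fitted to all penguins
zoomed <- base + coord_cartesian(xlim = c(4000, 5000))

# lims() turns points outside 4000-5000 g into NA and drops them (with a
# warning) before the statistic is computed, so the line uses that subset only
trimmed <- base + lims(x = c(4000, 5000))

zoomed
trimmed
```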
Themes
Themes control the non-data aspects of a plot, such as the background, grid lines, fonts, and legend placement.
ggplot2 provides a variety of complete themes (theme_gray() is the default):
better_plot + theme_classic()
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_point()`).
better_plot + theme_minimal()
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_point()`).
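A complete theme is often a starting point that you then adjust. Here is a sketch with a self-contained scatter plot standing in for better_plot. It scales up all text with the base_size argument and then overrides a single element with theme(). The size of 14 and the bottom legend are arbitrary choices for illustration.

```r
library(ggplot2)
library(palmerpenguins)

themed <- ggplot(penguins, aes(x = body_mass_g, y = flipper_length_mm, color = species)) +
  geom_point() +
  # a complete theme, with text scaled up from the default base size of 11
  theme_minimal(base_size = 14) +
  # theme() then overrides individual elements of that theme
  theme(legend.position = "bottom")
themed
```

Order matters here: a complete theme added after theme() would replace those tweaks, so put theme() last.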
Color palettes
Color palettes specify the set of colors used in a plot.
The RColorBrewer package provides a number of well designed color schemes, but there are other packages that provide color palettes and you can specify your own palette as well.
The predefined RColorBrewer palettes can be seen at this link. You can also view these palettes in your Quarto document as follows:
# Quarto chunk options (lines starting with #|) can set the figure size for this chunk;
# see the following link about figure options that can be set in Quarto docs
# https://quarto.org/docs/reference/formats/html.html#figures
display.brewer.all()
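Besides viewing them, you can pull a palette's colors into R directly. Under the hood each palette is just a character vector of hex codes, which is handy if you want to reuse a few of them in a manual scale. Here is a short sketch using RColorBrewer's brewer.pal() and brewer.pal.info:

```r
library(RColorBrewer)

# brewer.pal(n, name) returns the first n colors of a named palette
# as hex codes
dark2 <- brewer.pal(3, "Dark2")
dark2

# brewer.pal.info lists every palette, its maximum number of colors,
# its category ("div", "qual", or "seq"), and whether it is colorblind-safe
head(brewer.pal.info)
```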
Color and fill
Color and fill are the chief aesthetic properties affected by palettes, and in geoms that use both, they can be set independently. Functions that modify color in ggplot start with scale_color_, while those that modify fill start with scale_fill_.
If we wish to use one of the RColorBrewer palettes to change the color of points in a scatter plot, we can use scale_color_brewer:
# For categorical data I like the "Dark2" color scheme
better_plot + scale_color_brewer(palette = "Dark2")
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_point()`).
Here’s another one of the RColorBrewer palettes.
better_plot + scale_color_brewer(palette = "Set1")
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_point()`).
Defining your own color palettes
If you want to define your own palette, you can do so as follows:
# colors can be specified in various ways:
# ggplot recognizes both hex codes and color names (run colors() to list the names R knows;
# they largely overlap with the SVG color names at https://johndecember.com/html/spec/colorsvg.html)
# In this example we construct a custom palette using color names
custom_colors <- c("firebrick", "steelblue", "goldenrod")
better_plot + scale_color_manual(values = custom_colors)
Warning: Removed 2 rows containing missing values or values outside the scale range
(`geom_point()`).
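An unnamed vector like custom_colors is matched to the factor levels in order, so Adelie gets the first color, Chinstrap the second, and so on. If you want a particular color tied to a particular group regardless of level order, you can name the vector. Here is a sketch on a self-contained scatter plot. The specific color choices are arbitrary.

```r
library(ggplot2)
library(palmerpenguins)

# Naming the vector ties each color to a specific species, so the mapping
# does not depend on the order of the factor levels.
species_colors <- c(Gentoo = "goldenrod",
                    Adelie = "darkorange",
                    Chinstrap = "purple")

named_plot <- ggplot(penguins, aes(x = body_mass_g, y = flipper_length_mm, color = species)) +
  geom_point() +
  scale_color_manual(values = species_colors)
named_plot
```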
scale_fill_ examples
geom_density and geom_histogram have both color and fill aesthetics:
Warning: Removed 2 rows containing non-finite outside the scale range
(`stat_density()`).
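The code for this density example isn't shown. The facet and patchwork examples that follow also add an object called my_colors whose definition isn't shown either. Here is a hypothetical sketch covering both. It assumes my_colors bundles matching Dark2 color and fill scales, which works because ggplot2 lets you add a list of components to a plot in one step. Fill is mapped to species while the outline color is fixed, which shows that the two are set independently.

```r
library(ggplot2)
library(palmerpenguins)

# Hypothetical definition: one object that gives color and fill the same palette
my_colors <- list(
  scale_color_brewer(palette = "Dark2"),
  scale_fill_brewer(palette = "Dark2")
)

density_plot <- ggplot(penguins, aes(x = body_mass_g, fill = species)) +
  # fill is mapped to species; the outline color is a constant
  geom_density(alpha = 0.5, color = "gray30") +
  my_colors
density_plot
```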
Facets
Faceting is the process of subdividing the data based on one or more categorical variables and drawing each subset in its own panel.
# an example that involves filtering (via dplyr)
# and facetting to compare conditional distributions
penguins |>
  filter(!is.na(sex)) |>
  ggplot(aes(x = body_mass_g, fill = species)) +
  geom_histogram() +
  facet_wrap(vars(sex, species), ncol = 3) +
  labs(x = "Body Mass (g)",
       y = "# Observations",
       title = "Distribution of penguin body mass by sex and species",
       caption = "Data from Palmer Penguins library") +
  my_colors
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
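facet_wrap() wraps a one-dimensional sequence of panels into rows. When the panels come from crossing two variables, facet_grid() is an alternative: it draws one row per level of the first variable and one column per level of the second. Here is a sketch of the same comparison laid out that way. It also passes an explicit binwidth, which silences the bins = 30 message. The value of 200 g is an arbitrary choice for illustration.

```r
library(ggplot2)
library(dplyr)
library(palmerpenguins)

grid_plot <- penguins |>
  filter(!is.na(sex)) |>
  ggplot(aes(x = body_mass_g, fill = species)) +
  # binwidth is in the units of x (grams here)
  geom_histogram(binwidth = 200) +
  # one row per sex, one column per species
  facet_grid(rows = vars(sex), cols = vars(species))
grid_plot
```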
Organizing plots with patchwork
The patchwork library is useful for combining multiple plots into a single figure.
Patchwork uses a simple syntax for specifying the layout of plots, as illustrated below. See the patchwork docs for more details and lots of examples:
plot1 <- ggplot(penguins, aes(x = body_mass_g, fill = species)) +
  geom_histogram(position = "identity", # for non-stacked histograms
                 alpha = 0.5) +
  my_colors
plot2 <- ggplot(penguins, aes(x = flipper_length_mm, fill = species)) +
  geom_histogram(position = "identity", alpha = 0.5) +
  my_colors
plot3 <- ggplot(penguins, aes(x = body_mass_g, y = flipper_length_mm, color = species)) +
  geom_point() +
  my_colors
# adding places plots side by side (horizontally);
# dividing stacks them vertically
layout <- (plot1 + plot2) / (plot3)
layout
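When several panels share the same legend, the layout can be tidied further. Here is a sketch using patchwork's `|` operator (side by side, like `+`), plot_layout(guides = "collect") to merge identical legends into one, and plot_annotation(tag_levels = "A") to label the panels A, B, and so on. The two histograms are simplified, self-contained stand-ins for plot1 and plot2.

```r
library(ggplot2)
library(patchwork)
library(palmerpenguins)

p1 <- ggplot(penguins, aes(x = body_mass_g, fill = species)) +
  geom_histogram(position = "identity", alpha = 0.5, bins = 30)
p2 <- ggplot(penguins, aes(x = flipper_length_mm, fill = species)) +
  geom_histogram(position = "identity", alpha = 0.5, bins = 30)

combined <- (p1 | p2) +
  # gather the identical species legends into a single legend
  plot_layout(guides = "collect") +
  # tag the panels "A", "B", ...
  plot_annotation(tag_levels = "A")
combined
```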